Overview

Dataset statistics

Number of variables: 31
Number of observations: 1033
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 250.3 KiB
Average record size in memory: 248.1 B

Variable types

Boolean: 18
Categorical: 4
Numeric: 9

Alerts

First Sit is highly correlated with Second Sit (High correlation)
Second Sit is highly correlated with First Sit (High correlation)
Fails is highly correlated with Pass (High correlation)
Pass is highly correlated with Fails (High correlation)
English is highly correlated with Maths (High correlation)
Maths is highly correlated with English (High correlation)
Btec is highly correlated with A Levels (High correlation)
SLC is highly correlated with Student Visa (High correlation)
Bursary is highly correlated with Polar_4_Score (High correlation)
desertion is highly correlated with Progression (High correlation)
Progression is highly correlated with desertion (High correlation)
British is highly correlated with Student Visa (High correlation)
A Levels is highly correlated with Btec (High correlation)
Polar_4_Score is highly correlated with Bursary (High correlation)
Student Visa is highly correlated with SLC and 1 other field (High correlation)
UCAS is highly correlated with 25 Above and 1 other field (High correlation)
25 Above is highly correlated with UCAS (High correlation)
Disability is highly correlated with Bursary (High correlation)
British is highly correlated with English native Language and 3 other fields (High correlation)
English native Language is highly correlated with British (High correlation)
Polar_4_Score is highly correlated with Bursary and 1 other field (High correlation)
SLC is highly correlated with British and 1 other field (High correlation)
Care Leaver is highly correlated with Refugee (High correlation)
Student Visa is highly correlated with British and 2 other fields (High correlation)
Refugee is highly correlated with Care Leaver (High correlation)
London Permanent Residence is highly correlated with British and 1 other field (High correlation)
UCAS Points is highly correlated with English (High correlation)
English is highly correlated with UCAS Points and 1 other field (High correlation)
Bursary is highly correlated with Disability and 1 other field (High correlation)
Attendance is highly correlated with Progression and 1 other field (High correlation)
Progression is highly correlated with Attendance and 3 other fields (High correlation)
First Sit is highly correlated with Second Sit and 2 other fields (High correlation)
Second Sit is highly correlated with First Sit and 1 other field (High correlation)
Fails is highly correlated with Progression and 2 other fields (High correlation)
No Submissions is highly correlated with First Sit and 2 other fields (High correlation)
Pass is highly correlated with Progression and 2 other fields (High correlation)
Re Takes is highly correlated with No Submissions (High correlation)
desertion is highly correlated with UCAS and 6 other fields (High correlation)
Second Sit has 216 (20.9%) zeros (Zeros)
Fails has 850 (82.3%) zeros (Zeros)
No Submissions has 423 (40.9%) zeros (Zeros)

Reproduction

Analysis started: 2022-09-02 17:51:38.244812
Analysis finished: 2022-09-02 17:51:55.575177
Duration: 17.33 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

UCAS
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
True   938    90.8%
False  95     9.2%

25 Above
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
False  872    84.4%
True   161    15.6%

Disability
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
False  967    93.6%
True   66     6.4%

Ethnicity
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1033
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      603    58.4%
1      430    41.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      603    58.4%
1      430    41.6%

Most occurring characters

Value  Count  Frequency (%)
0      603    58.4%
1      430    41.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1033   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      603    58.4%
1      430    41.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  1033   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      603    58.4%
1      430    41.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1033   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      603    58.4%
1      430    41.6%

Gender
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.2 KiB

Length

Max length: 6
Median length: 4
Mean length: 4.762826718
Min length: 4

Characters and Unicode

Total characters: 4920
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Male
3rd row: Male
4th row: Female
5th row: Male

Common Values

Value   Count  Frequency (%)
Male    639    61.9%
Female  394    38.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value   Count  Frequency (%)
male    639    61.9%
female  394    38.1%

Most occurring characters

Value  Count  Frequency (%)
e      1427   29.0%
a      1033   21.0%
l      1033   21.0%
M      639    13.0%
F      394    8.0%
m      394    8.0%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  3887   79.0%
Uppercase Letter  1033   21.0%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e      1427   36.7%
a      1033   26.6%
l      1033   26.6%
m      394    10.1%

Uppercase Letter
Value  Count  Frequency (%)
M      639    61.9%
F      394    38.1%

Most occurring scripts

Value  Count  Frequency (%)
Latin  4920   100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
e      1427   29.0%
a      1033   21.0%
l      1033   21.0%
M      639    13.0%
F      394    8.0%
m      394    8.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4920   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
e      1427   29.0%
a      1033   21.0%
l      1033   21.0%
M      639    13.0%
F      394    8.0%
m      394    8.0%

British
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
True   650    62.9%
False  383    37.1%

English native Language
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
True   566    54.8%
False  467    45.2%

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
False  572    55.4%
True   461    44.6%

Polar_4_Score
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.2 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 3099
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 1.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value  Count  Frequency (%)
0.0    789    76.4%
1.0    244    23.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0.0    789    76.4%
1.0    244    23.6%

Most occurring characters

Value  Count  Frequency (%)
0      1822   58.8%
.      1033   33.3%
1      244    7.9%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     2066   66.7%
Other Punctuation  1033   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      1822   88.2%
1      244    11.8%

Other Punctuation
Value  Count  Frequency (%)
.      1033   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  3099   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      1822   58.8%
.      1033   33.3%
1      244    7.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3099   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      1822   58.8%
.      1033   33.3%
1      244    7.9%

SLC
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
True   734    71.1%
False  299    28.9%

Care Leaver
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
False  1014   98.2%
True   19     1.8%

Student Visa
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
False  878    85.0%
True   155    15.0%

Refugee
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
False  1009   97.7%
True   24     2.3%

London Permanent Residence
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
True   573    55.5%
False  460    44.5%

UCAS Points
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 60
Distinct (%): 5.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 108.8460794
Minimum: 72
Maximum: 168
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 72
5-th percentile: 82
Q1: 96
median: 104
Q3: 119
95-th percentile: 152
Maximum: 168
Range: 96
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 19.70299151
Coefficient of variation (CV): 0.181017007
Kurtosis: 1.082275217
Mean: 108.8460794
Median Absolute Deviation (MAD): 10
Skewness: 1.042881347
Sum: 112438
Variance: 388.2078746
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
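
Several of the UCAS Points figures above are derived from one another, and the relationships can be checked with a few lines of arithmetic. The sketch below is only a consistency check on the reported numbers; the variable names are illustrative, and the formulas used (mean = sum / n, variance = standard deviation squared, CV = standard deviation / mean, range = maximum - minimum, IQR = Q3 - Q1) are the standard definitions, assumed rather than taken from the report's source code.

```python
from math import isclose

# Figures copied from the report for UCAS Points.
n_observations = 1033
total = 112438
reported_mean = 108.8460794
reported_sd = 19.70299151
reported_variance = 388.2078746
reported_cv = 0.181017007
minimum, q1, q3, maximum = 72, 96, 119, 168

# Standard definitions (assumed): each derived value should match the report.
mean = total / n_observations
variance = reported_sd ** 2
cv = reported_sd / reported_mean
value_range = maximum - minimum
iqr = q3 - q1

print(f"mean={mean:.7f} variance={variance:.7f} cv={cv:.9f}")
print(f"range={value_range} iqr={iqr}")
```

The printed values agree with the reported Mean, Variance, Coefficient of variation, Range and Interquartile range up to rounding.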

Common values

Value              Count  Frequency (%)
104                105    10.2%
96                 84     8.1%
128                47     4.5%
120                36     3.5%
80                 36     3.5%
112                35     3.4%
84                 35     3.4%
100                33     3.2%
88                 33     3.2%
103                30     2.9%
Other values (50)  559    54.1%

Minimum values

Value  Count  Frequency (%)
72     4      0.4%
80     36     3.5%
82     22     2.1%
84     35     3.4%
85     1      0.1%
86     10     1.0%
87     5      0.5%
88     33     3.2%
89     7      0.7%
90     6      0.6%

Maximum values

Value  Count  Frequency (%)
168    25     2.4%
162    5      0.5%
160    8      0.8%
155    1      0.1%
153    8      0.8%
152    15     1.5%
148    6      0.6%
146    4      0.4%
144    18     1.7%
136    6      0.6%

English
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.936108422
Minimum: 2
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 2
5-th percentile: 3
Q1: 4
median: 5
Q3: 5
95-th percentile: 8
Maximum: 9
Range: 7
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.273531665
Coefficient of variation (CV): 0.2580031791
Kurtosis: 0.9445088077
Mean: 4.936108422
Median Absolute Deviation (MAD): 1
Skewness: 0.7882962078
Sum: 5099
Variance: 1.621882903
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=8)

Common values

Value  Count  Frequency (%)
5      416    40.3%
4      290    28.1%
6      120    11.6%
3      82     7.9%
8      53     5.1%
7      51     4.9%
2      11     1.1%
9      10     1.0%

Minimum values

Value  Count  Frequency (%)
2      11     1.1%
3      82     7.9%
4      290    28.1%
5      416    40.3%
6      120    11.6%
7      51     4.9%
8      53     5.1%
9      10     1.0%

Maximum values

Value  Count  Frequency (%)
9      10     1.0%
8      53     5.1%
7      51     4.9%
6      120    11.6%
5      416    40.3%
4      290    28.1%
3      82     7.9%
2      11     1.1%

Maths
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.80929332
Minimum: 2
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 2
5-th percentile: 3
Q1: 4
median: 5
Q3: 5
95-th percentile: 7
Maximum: 9
Range: 7
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.104707503
Coefficient of variation (CV): 0.2297026672
Kurtosis: 1.334057792
Mean: 4.80929332
Median Absolute Deviation (MAD): 1
Skewness: 0.5204563193
Sum: 4968
Variance: 1.220378667
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=8)

Common values

Value  Count  Frequency (%)
5      417    40.4%
4      345    33.4%
6      124    12.0%
7      60     5.8%
3      46     4.5%
2      22     2.1%
8      14     1.4%
9      5      0.5%

Minimum values

Value  Count  Frequency (%)
2      22     2.1%
3      46     4.5%
4      345    33.4%
5      417    40.4%
6      124    12.0%
7      60     5.8%
8      14     1.4%
9      5      0.5%

Maximum values

Value  Count  Frequency (%)
9      5      0.5%
8      14     1.4%
7      60     5.8%
6      124    12.0%
5      417    40.4%
4      345    33.4%
3      46     4.5%
2      22     2.1%

A Levels
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
True   579    56.1%
False  454    43.9%

Btec
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
False  654    63.3%
True   379    36.7%

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
True   541    52.4%
False  492    47.6%

Bursary
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
False  787    76.2%
True   246    23.8%

Attendance
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 63
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 75.08712488
Minimum: 20
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 20
5-th percentile: 46
Q1: 64
median: 76
Q3: 88
95-th percentile: 97
Maximum: 100
Range: 80
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 15.73841886
Coefficient of variation (CV): 0.2096020974
Kurtosis: -0.6273441074
Mean: 75.08712488
Median Absolute Deviation (MAD): 12
Skewness: -0.3975210015
Sum: 77565
Variance: 247.6978283
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
60                 34     3.3%
92                 31     3.0%
95                 29     2.8%
74                 28     2.7%
96                 27     2.6%
90                 27     2.6%
81                 27     2.6%
72                 26     2.5%
65                 25     2.4%
94                 25     2.4%
Other values (53)  754    73.0%

Minimum values

Value  Count  Frequency (%)
20     1      0.1%
25     1      0.1%
40     6      0.6%
41     6      0.6%
42     14     1.4%
43     3      0.3%
44     8      0.8%
45     12     1.2%
46     7      0.7%
47     10     1.0%

Maximum values

Value  Count  Frequency (%)
100    15     1.5%
99     15     1.5%
98     20     1.9%
97     15     1.5%
96     27     2.6%
95     29     2.8%
94     25     2.4%
93     13     1.3%
92     31     3.0%
91     16     1.5%

Progression
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
True   850    82.3%
False  183    17.7%

First Sit
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.019361084
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
median: 4
Q3: 5
95-th percentile: 6
Maximum: 6
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.304291061
Coefficient of variation (CV): 0.3245020873
Kurtosis: -0.7064480272
Mean: 4.019361084
Median Absolute Deviation (MAD): 1
Skewness: 0.02188714218
Sum: 4152
Variance: 1.701175173
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value  Count  Frequency (%)
3      372    36.0%
4      219    21.2%
6      190    18.4%
5      183    17.7%
2      36     3.5%
1      33     3.2%

Minimum values

Value  Count  Frequency (%)
1      33     3.2%
2      36     3.5%
3      372    36.0%
4      219    21.2%
5      183    17.7%
6      190    18.4%

Maximum values

Value  Count  Frequency (%)
6      190    18.4%
5      183    17.7%
4      219    21.2%
3      372    36.0%
2      36     3.5%
1      33     3.2%

Second Sit
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.818973863
Minimum: 0
Maximum: 5
Zeros: 216
Zeros (%): 20.9%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 2
Q3: 3
95-th percentile: 3
Maximum: 5
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.256436187
Coefficient of variation (CV): 0.690739
Kurtosis: -0.8238500089
Mean: 1.818973863
Median Absolute Deviation (MAD): 1
Skewness: -0.002015094775
Sum: 1879
Variance: 1.578631892
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value  Count  Frequency (%)
3      348    33.7%
2      232    22.5%
0      216    20.9%
1      199    19.3%
5      20     1.9%
4      18     1.7%

Minimum values

Value  Count  Frequency (%)
0      216    20.9%
1      199    19.3%
2      232    22.5%
3      348    33.7%
4      18     1.7%
5      20     1.9%

Maximum values

Value  Count  Frequency (%)
5      20     1.9%
4      18     1.7%
3      348    33.7%
2      232    22.5%
1      199    19.3%
0      216    20.9%

Fails
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5614714424
Minimum: 0
Maximum: 5
Zeros: 850
Zeros (%): 82.3%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 4
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.308272192
Coefficient of variation (CV): 2.330077886
Kurtosis: 3.627648736
Mean: 0.5614714424
Median Absolute Deviation (MAD): 0
Skewness: 2.21727455
Sum: 580
Variance: 1.711576127
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value  Count  Frequency (%)
0      850    82.3%
2      52     5.0%
3      50     4.8%
4      39     3.8%
5      32     3.1%
1      10     1.0%

Minimum values

Value  Count  Frequency (%)
0      850    82.3%
1      10     1.0%
2      52     5.0%
3      50     4.8%
4      39     3.8%
5      32     3.1%

Maximum values

Value  Count  Frequency (%)
5      32     3.1%
4      39     3.8%
3      50     4.8%
2      52     5.0%
1      10     1.0%
0      850    82.3%

No Submissions
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.234269119
Minimum: 0
Maximum: 5
Zeros: 423
Zeros (%): 40.9%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 2
95-th percentile: 4
Maximum: 5
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.363742454
Coefficient of variation (CV): 1.104898788
Kurtosis: -0.06660940015
Mean: 1.234269119
Median Absolute Deviation (MAD): 1
Skewness: 0.9492579822
Sum: 1275
Variance: 1.859793482
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value  Count  Frequency (%)
0      423    40.9%
1      253    24.5%
2      165    16.0%
3      96     9.3%
4      76     7.4%
5      20     1.9%

Minimum values

Value  Count  Frequency (%)
0      423    40.9%
1      253    24.5%
2      165    16.0%
3      96     9.3%
4      76     7.4%
5      20     1.9%

Maximum values

Value  Count  Frequency (%)
5      20     1.9%
4      76     7.4%
3      96     9.3%
2      165    16.0%
1      253    24.5%
0      423    40.9%

Late Submission
Categorical

Distinct: 4
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 8.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1033
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
1      424    41.0%
0      409    39.6%
2      175    16.9%
3      25     2.4%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
1      424    41.0%
0      409    39.6%
2      175    16.9%
3      25     2.4%

Most occurring characters

Value  Count  Frequency (%)
1      424    41.0%
0      409    39.6%
2      175    16.9%
3      25     2.4%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1033   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      424    41.0%
0      409    39.6%
2      175    16.9%
3      25     2.4%

Most occurring scripts

Value   Count  Frequency (%)
Common  1033   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      424    41.0%
0      409    39.6%
2      175    16.9%
3      25     2.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1033   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      424    41.0%
0      409    39.6%
2      175    16.9%
3      25     2.4%

Pass
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 91.6934495
Minimum: 16.66666667
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 16.66666667
5-th percentile: 33.33333333
Q1: 100
median: 100
Q3: 100
95-th percentile: 100
Maximum: 100
Range: 83.33333333
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 19.74286232
Coefficient of variation (CV): 0.2153137703
Kurtosis: 3.913579725
Mean: 91.6934495
Median Absolute Deviation (MAD): 0
Skewness: -2.287006307
Sum: 94719.33333
Variance: 389.7806127
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Common values

Value        Count  Frequency (%)
100          850    82.3%
33.33333333  51     4.9%
50           50     4.8%
66.66666667  39     3.8%
83.33333333  32     3.1%
16.66666667  10     1.0%
86           1      0.1%

Minimum values

Value        Count  Frequency (%)
16.66666667  10     1.0%
33.33333333  51     4.9%
50           50     4.8%
66.66666667  39     3.8%
83.33333333  32     3.1%
86           1      0.1%
100          850    82.3%

Maximum values

Value        Count  Frequency (%)
100          850    82.3%
86           1      0.1%
83.33333333  32     3.1%
66.66666667  39     3.8%
50           50     4.8%
33.33333333  51     4.9%
16.66666667  10     1.0%

Re Takes
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
False  878    85.0%
True   155    15.0%

desertion
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
False  875    84.7%
True   158    15.3%

Interactions


Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
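The calculation described above can be sketched in plain Python. This is only an illustration, not the code used to produce this report: the helper names `average_ranks` and `spearman_rho` are hypothetical, ties are assumed to receive the average of their rank positions, and the English and Maths values are the ten sample rows listed under "First rows", so the result says nothing about the full dataset.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their
    standard deviations. Undefined (division by zero) for constant input."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in rx) / n)
    sy = sqrt(sum((b - my) ** 2 for b in ry) / n)
    return cov / (sx * sy)

# English and Maths grades from the ten "First rows" sample records.
english = [5, 5, 4, 9, 6, 6, 6, 4, 4, 4]
maths = [4, 5, 4, 8, 5, 4, 5, 5, 4, 4]
print(spearman_rho(english, maths))
```

Because only the ranks enter the formula, any strictly increasing transformation of X or Y leaves ρ unchanged.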

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the direction of the slope determines its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
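As a minimal sketch of this formula and of the invariance property, the snippet below computes r by hand and then recomputes it after shifting and rescaling both variables. The function name `pearson_r` and the numeric values are made up for illustration and are not taken from the profiled dataset.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.
    Undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# Illustrative values only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r_original = pearson_r(x, y)
# Separate location and positive scale changes leave r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# A negative scale factor on one variable flips the sign of r.
r_flipped = pearson_r([-a for a in x], y)

print(r_original, r_rescaled, r_flipped)
```

The first two printed values agree up to floating-point rounding, and the third has the same magnitude with the opposite sign.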

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
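The pair-counting definition above translates directly into code. The sketch below implements exactly that definition (often called tau-a), with tied pairs counted as neither concordant nor discordant. The function name `kendall_tau_a` is hypothetical, and library implementations often apply a tie correction (tau-b), so their values may differ from this sketch on data with ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Illustrative values: one swapped pair out of six gives (5 - 1) / 6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

This direct version compares every pair, so its cost grows quadratically with the number of observations.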

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
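For orientation, the following is a minimal sketch of a bias-corrected Cramér's V in the form commonly attributed to Bergsma (2013): the chi-squared statistic is divided by n, a bias term is subtracted, and the row and column counts are shrunk before normalising. The function name `cramers_v_corrected` is hypothetical, and the exact computation used by the tool that generated this report (for example, whether a continuity correction is applied) is an assumption not confirmed by the report. The A Levels and Btec values are the ten records listed under "First rows".

```python
from collections import Counter
from math import sqrt

def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row_label, row_n in row_totals.items():
        for col_label, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((row_label, col_label), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table (e.g. a constant variable)
    return sqrt(phi2_corr / denom)

# A Levels and Btec flags from the ten "First rows" sample records.
a_levels = ["yes", "yes", "yes", "yes", "yes", "no", "yes", "yes", "no", "no"]
btec = ["no", "no", "no", "no", "no", "yes", "no", "no", "yes", "no"]
print(cramers_v_corrected(a_levels, btec))
```

With only ten rows the printed value is purely illustrative; the alert in the overview refers to the full 1033 observations.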

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
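The two displays described here can be approximated in text form. The sketch below is an assumption-laden illustration rather than the plotting code behind this report: records are assumed to be dictionaries sharing the same keys, a missing cell is assumed to be `None`, and the function names and sample records are hypothetical.

```python
def nullity_by_column(rows):
    """Count missing (None) cells per column."""
    columns = list(rows[0].keys())  # assumes every row has the same keys
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}

def nullity_matrix(rows):
    """One string per row: '#' marks a present cell, '.' marks a missing one."""
    columns = list(rows[0].keys())
    return ["".join("." if row.get(col) is None else "#" for col in columns)
            for row in rows]

# Hypothetical records with some gaps, for illustration only.
records = [
    {"Attendance": 86, "Pass": 100.0},
    {"Attendance": None, "Pass": None},
    {"Attendance": 57, "Pass": None},
]
print(nullity_by_column(records))
for line in nullity_matrix(records):
    print(line)
```

For the dataset profiled in this report, every count would be zero, since the overview lists 0 missing cells.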

Sample

First rows

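Index | UCAS | 25 Above | Disability | Ethnicity | Gender | British | English native Language | Parent He attendance | Polar_4_Score | SLC | Care Leaver | Student Visa | Refugee | London Permanent Residence | UCAS Points | English | Maths | A Levels | Btec | Previous work | Bursary | Attendance | Progression | First Sit | Second Sit | Fails | No Submissions | Late Submission | Pass | Re Takes | desertion
0 | no | no | no | 1 | Male | no | no | yes | 0.0 | no | no | yes | no | yes | 98.0 | 5.0 | 4.0 | yes | no | yes | no | 86 | yes | 3 | 3.0 | 0 | 2 | 2 | 100.000000 | yes | no
1 | no | no | no | 0 | Male | no | no | yes | 1.0 | yes | no | no | no | no | 101.0 | 5.0 | 5.0 | yes | no | yes | yes | 55 | no | 1 | 2.0 | 5 | 3 | 0 | 83.333333 | no | yes
2 | no | no | no | 1 | Male | yes | yes | yes | 0.0 | yes | no | no | no | yes | 129.0 | 4.0 | 4.0 | yes | no | yes | no | 57 | yes | 6 | 0.0 | 0 | 0 | 0 | 100.000000 | no | yes
3 | no | yes | no | 0 | Female | no | no | no | 0.0 | yes | no | no | no | yes | 110.0 | 9.0 | 8.0 | yes | no | yes | no | 48 | yes | 6 | 0.0 | 0 | 0 | 0 | 100.000000 | no | yes
4 | no | no | no | 0 | Male | yes | yes | yes | 0.0 | yes | no | no | no | yes | 130.0 | 6.0 | 5.0 | yes | no | yes | no | 83 | yes | 4 | 2.0 | 0 | 2 | 0 | 100.000000 | no | no
5 | yes | no | no | 1 | Male | yes | yes | yes | 0.0 | yes | no | no | no | yes | 112.0 | 6.0 | 4.0 | no | yes | no | no | 71 | yes | 3 | 3.0 | 0 | 0 | 1 | 100.000000 | no | no
6 | yes | no | no | 0 | Male | no | yes | no | 0.0 | no | no | yes | no | no | 89.0 | 6.0 | 5.0 | yes | no | no | no | 96 | yes | 4 | 2.0 | 0 | 0 | 2 | 100.000000 | no | no
7 | yes | no | no | 0 | Male | yes | yes | no | 0.0 | yes | no | no | no | yes | 103.0 | 4.0 | 5.0 | yes | no | no | no | 67 | yes | 3 | 3.0 | 0 | 3 | 0 | 100.000000 | no | no
8 | yes | no | no | 0 | Male | yes | yes | no | 1.0 | no | no | no | no | yes | 128.0 | 4.0 | 4.0 | no | yes | no | yes | 89 | yes | 6 | 0.0 | 0 | 0 | 0 | 100.000000 | no | no
9 | yes | no | no | 0 | Female | yes | yes | no | 0.0 | no | no | no | no | no | 91.0 | 4.0 | 4.0 | no | no | no | no | 92 | yes | 6 | 0.0 | 0 | 1 | 1 | 100.000000 | no | no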

Last rows

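Index | UCAS | 25 Above | Disability | Ethnicity | Gender | British | English native Language | Parent He attendance | Polar_4_Score | SLC | Care Leaver | Student Visa | Refugee | London Permanent Residence | UCAS Points | English | Maths | A Levels | Btec | Previous work | Bursary | Attendance | Progression | First Sit | Second Sit | Fails | No Submissions | Late Submission | Pass | Re Takes | desertion
1023 | yes | no | no | 1 | Male | no | no | yes | 0.0 | yes | no | no | no | no | 107.0 | 6.0 | 7.0 | no | yes | no | no | 96 | yes | 6 | 0.0 | 0 | 0 | 1 | 100.000000 | no | no
1024 | yes | yes | no | 0 | Male | yes | yes | no | 0.0 | yes | no | no | no | no | 103.0 | 5.0 | 6.0 | no | yes | yes | no | 67 | yes | 1 | 5.0 | 0 | 3 | 0 | 100.000000 | no | no
1025 | yes | no | no | 0 | Male | yes | yes | yes | 0.0 | yes | no | no | no | no | 100.0 | 5.0 | 4.0 | yes | no | yes | no | 70 | yes | 6 | 0.0 | 0 | 0 | 1 | 100.000000 | no | no
1026 | yes | yes | no | 0 | Female | no | no | yes | 0.0 | yes | no | no | no | no | 113.0 | 3.0 | 6.0 | no | yes | no | no | 64 | yes | 3 | 3.0 | 0 | 2 | 1 | 100.000000 | no | no
1027 | yes | yes | no | 0 | Male | yes | no | yes | 1.0 | yes | no | no | no | yes | 118.0 | 5.0 | 5.0 | yes | no | yes | yes | 96 | yes | 3 | 3.0 | 0 | 1 | 0 | 100.000000 | no | no
1028 | yes | no | no | 1 | Female | no | yes | no | 1.0 | yes | no | no | no | no | 102.0 | 4.0 | 4.0 | yes | no | yes | no | 55 | yes | 6 | 0.0 | 0 | 0 | 1 | 100.000000 | no | yes
1029 | yes | no | no | 0 | Male | yes | yes | yes | 0.0 | yes | no | no | no | yes | 109.0 | 4.0 | 4.0 | yes | no | yes | no | 66 | yes | 6 | 0.0 | 0 | 0 | 0 | 100.000000 | no | no
1030 | no | yes | no | 1 | Female | no | no | no | 1.0 | no | no | no | no | no | 104.0 | 6.0 | 5.0 | yes | no | yes | no | 42 | no | 1 | 1.0 | 2 | 4 | 1 | 33.333333 | yes | yes
1031 | no | yes | no | 1 | Male | no | no | yes | 0.0 | yes | no | no | no | no | 101.0 | 6.0 | 6.0 | no | yes | no | no | 60 | yes | 6 | 0.0 | 0 | 0 | 0 | 100.000000 | no | no
1032 | no | yes | no | 1 | Female | no | no | no | 0.0 | no | no | yes | no | no | 104.0 | 8.0 | 4.0 | no | no | no | no | 71 | yes | 6 | 0.0 | 0 | 0 | 0 | 100.000000 | no | no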